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Abstract 

The non-negative solution to an underdetermined linear system can be uniquely recovered 
sometimes, even without imposing any additional sparsity constraints. In this paper, we 
derive conditions under which a unique non-negative solution for such a system can exist, 
based on the theory of polytopes. Furthermore, we develop the paradigm of combined 
sparse representations, where only a part of the coefficient vector is constrained to be non- 
negative, and the rest is unconstrained (general). We analyze the recovery of the unique, 
sparsest solution, for combined representations, under three different cases of coefficient 
support knowledge: (a) the non-zero supports of non-negative and general coefficients 
are known, (b) the non-zero support of general coefficients alone is known, and (c) both 
the non-zero supports are unknown. For case (c), we propose the combined orthogonal 
matching pursuit algorithm for coefficient recovery and derive the deterministic spar- 
sity threshold under which recovery of the unique, sparsest coefficient vector is possible. 
Simulation results for recovery of combined representations using randomly generated 
dictionaries and coefficients are presented. We show that the basis pursuit algorithm, 
with partial non-negative constraints, and the proposed greedy algorithm perform better 
in recovering the unique sparse representation when compared to their unconstrained 
counterparts. 

Keywords: underdetermined linear system, sparse representations, non-negative 
constraints, orthogonal matching pursuit, unique sparse solution 



1. Introduction 

We investigate the problem of recovering non-negative and combined sparse represen- 
tations from underdetermined linear models. The system of linear equations with the 
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constraint that the solution is non-negative can be expressed as 



y = Xa, where a > 0, 



(1) 



where y G M. AI is the data vector, a G M * is the non-negative solution (coefficient 
vector), and X G ^Afxit^ j g dictionary with > M. When only a part of the 
solution is constrained to be non-negative and the rest is unconstrained (general), we 
obtain the combined representation model, 



Here, the coefficient vector (3 G R x is unconstrained, and X G R x x and D G R MxKd 
are the sub-dictionaries for the non-negative and general representations respectively. We 
denote the combined coefficient vector as 6 = [at T /3 T ] T , and the combined dictionary as 
G = [X D] . We assume that G is overcomplete with K x + K d > M, and the columns 
of the dictionaries are normalized to have unit li norm. The sparsest solutions to (II]) 
and (j2| are obtained by minimizing the £o norm, the number of non-zero elements, of the 
corresponding non-negative coefficient vector, a, or the combined coefficient vector, 6. 
In both the cases, the unique minimum Cq norm solution, when it exists, will be referred 
to as MLq solution. In this paper, we focus on obtaining deterministic guarantees for 
recovery of the ML solutions to the linear systems ([!]) and ([2]), using both convex and 
greedy algorithms, based on the properties of the dictionaries. 

Some of the applications of the non-negative representation model in ([I]), and the 
combined model in ^ are in image inpainting PQ, automatic speech recognition using 
exemplars [2J, protein mass spectrometry [3], astronomical imaging [3], spectroscopy [3], 
source separation [6] , and clustering/ semi-supervised learning of data [8] , to name a 
few. 

1.1. Prior Work 

For the non-negative representation model in (II]), a sufficiently sparse MLq solution 
can be recovered by minimizing the l\ norm of ex., using the non-negative version of the 
basis pursuit (BP) algorithm [U], which we refer to as NN-BP. The optimization program 
can be expressed as 



y = Xa + D/3, where a > 0. 



(2) 



min l T a subject to y = Xa, a > 0. 



(3) 



a 
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The conditions on X under which the recovery of ML solution using ([3]) is possible have 
been derived based on the neighborliness of polytopes [TQl ttH H2] , and the non- negative 
null-space property [13]. A non-negative version of the greedy orthogonal matching pur- 
suit (OMP) algorithm [13], which we will refer to as NN-OMP, for recovering the coeffi- 
cients has also been proposed [To]. If the set 

{a\y = Xcx,a > 0} (4) 

contains only one solution, we can use any variational function instead of the l\ norm in 
order to obtain the unique non-negative solution [TTJ [T2J US] . In particular, the solution 
can be obtained by using the non-negative least squares (NNLS) algorithm [31 [16]. 

A major part of our work investigates the combined sparse representation model 
introduced in where only a part of the sparse coefficient vector is constrained to 
be non- negative. We consider the deterministic sparsity thresholds i.e., the maximum 
number of non- zero coefficients possible in the MLq solution, such that the MLq solution 
can be uniquely recovered. To the best of our knowledge, such an investigation has not 
been reported so far in the literature. However, when both a and (3 are unconstrained 
general sparse vectors, the sparsity thresholds for recovery of the MLq solution have been 
presented in [T71 [18] . By considering the coherence parameters of X and D separately, 
the authors in [T7] show that an improvement up to a factor of two can be achieved in 
the deterministic sparsity threshold when compared to considering X and D together as 
a single dictionary. Note that deterministic sparsity thresholds provide guarantees that 
hold for all sparsity patterns and non-zero values in the coefficient vectors. Probabilistic 
or robust sparsity thresholds, that hold for most sparsity patterns and non-zero values in 
the coefficient vectors have also been derived in [17] . again for the case where ot and (3 are 
general sparse vectors. When this representation is approximately sparse and corrupted 
by additive noise, theory and algorithms for coefficient recovery are presented in [19j . 

1.2. Contributions 

We present deterministic recovery guarantees for both the non-negative and the com- 
bined sparse representation models given by Q and ^ respectively. Furthermore, we 
propose a greedy algorithm for performing coefficient recovery in combined representa- 
tions and derive deterministic sparsity thresholds for unique recovery using l\ minimiza- 
tion and the proposed greedy algorithm. 
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For the non-negative model in ([T]), we derive the sufficient conditions for Q to be 
singleton based on the neighborliness properties of the quotient polytope corresponding 
to the dictionary X. Similar analyses reported in [HI [12] assume that the dictionary X is 
obtained from a random ensemble and append a row of ones to it, such that the row span 
of X contains the vector 1 T . In contrast, we do not assume any randomness on X and 
only require that its row span intersects the positive orthant. We show that the sparsity 
threshold on ex, for the set Q to be singleton, is the same as the deterministic sparsity 
threshold for recovering the ML Q solution of a general sparse representation. When- 
ever this threshold is satisfied, £i-norm regularization in (|3| can be replaced with any 
variational function. Section [2] presents the analysis of the non-negative representation 
model. 

For the combined model in ([2]), we propose a variant of the greedy OMP algorithm, the 
combined OMP (COMB-OMP) algorithm, for performing coefficient recovery. We also 
consider a i\ regularized convex algorithm, which we refer to as combined BP (COMB- 
BP). We derive the deterministic sparsity thresholds for recovering the ML Q solution 
using both the COMB-BP and COMB-OMP algorithms. We show that a factor-of-two 
improvement in the sparsity threshold, observed when ex and (3 are general sparse vectors 
[T7] , holds for recovery using the COMB-BP also. We also show that such an improvement 
in the sparsity threshold cannot be observed using the COMB-OMP algorithm, because 
of the partial non-negativity constraint on the coefficient vector. However, COMB-OMP 
incurs very low computational complexity when compared to COMB-BP. Furthermore, 
we obtain the sparsity thresholds in the following cases of coefficient support knowledge: 
(a) the non-zero support of both ex, f3 are known, and (b) non-zero support of (3 alone 
is known. When analyzing case (b), we factor out the contribution of the general repre- 
sentation component and arrive at conditions under which ^-norm regularization in the 
resulting optimization can be replaced with any variational function for the recovery of 
ex. Section [3] presents all the details in the analysis of the combined representation model. 

The performance of the COMB-BP and the COMB-OMP algorithms are also analyzed 
using simulations. The dictionary G is obtained from a Gaussian ensemble and the non- 
zero coefficients are obtained either from uniform distribution or fixed as random signs 
(±1). It is shown that both COMB-BP and COMB-OMP respectively perform better 

than their unconstrained counterparts, the BP and the OMP, particularly as the K x be- 
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comes larger. We also show that the COMB-OMP incurs substantially less computational 
complexity when compared to the COMB-BP. 

1.3. Notation 

Lowercase boldface letters denote column vectors and uppercase boldface denote ma- 
trices, e.g., a and A denote a vector and a matrix respectively, a, indicates the i th column 
of the matrix A. The Moore-Penrose pseudoinverse of a matrix A, given by (A T A) _1 A T 
is denoted as AL The notation A = diag(a) means that A is a diagonal matrix with the 
elements of the vector a as its diagonal. |A| refers to a matrix whose elements are the 
absolute values of the elements of A and the same notation applies to vectors also. The 
maximum row and maximum column sums of A are referred to as HAH^oo and || A || 1>x 
respectively. A set is denoted as A, its cardinality is given by \A\ and its complement by 
A c . The operator [.]+ returns the maximum of the argument and zero, and max(., .) re- 
turns the maximum of the two arguments. Ik denotes an identity matrix of size K x K, 
1ki,k 2 is a matrix of ones with size K\ x K^- Similar notation will be employed for 
defining vectors also. When it is clear from context, the subscripts will be dropped for 
simplicity. 

2. Non-negative Sparse Representations 

For the non-negative representation given in (jlj), we denote the number of non-zero 
coefficients in ot as S x . 

Definition 1. Q20J) The two-sided coherence (or simply coherence) of the dictionary X 
is 

Ix^x • I 

/i x = max T — % J , (5) 

% +3 ||Xj|| 2 ||X.,||2 

Definition 2. (p2]) The one-sided coherence of the dictionary X is 

I T x I 

o x = max n % ■ (6) 

If the columns of X are normalized, we have \x x = o~ x , and if they had different £2 norms, 
we would have fi x < cr x [151 Lemma 1]. 

Definition 3. ([15]) The dictionary X belongs to the class of matrices denoted as 

if its row span intersects the positive orthant. 
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If X G M + , 3h such that h T X = w T , w > 0. Let us define W = diag(w), U = XW" 1 , 
and denote a u and \i u as the one-sided and two-sided coherences of U respectively. In 
[T2| Theorem 1] it is shown that X G M + is a necessary condition for ^ to be singleton. 
The main result in [TSJ Theorem 2] states that the set Q will be singleton if X G .M + 
and 5^ < 0.5(1 + 1/<t u ). 

We will now state the main result of this section, whose proof will be relegated to the 
end of this section. 

Theorem 2.1. When X G Ai + , the set defined in is singleton if the number of 
non-zero entries in a., S x < 0.5(1 + l/fj, x ). 

The threshold given in the above theorem is better than that of [T51 Theorem 2], because 
= (J> u , < o~u and hence \i x < a u . We are able to improve the threshold by resorting 
to geometric arguments based on the theory of polytopes. The rest of this section will 
state and prove lemmas that will be used in the proof of our main result. 

We will define three geometric entities, the cross-polytope C Kx , the simplex T Kx ~ 1 
and the positive orthant WL + X , that will be used in the proof. The cross-polytope is defined 
as the Ei ball, ||ck||i < 1, in M. Kx , and r f Kx ~ 1 is the standard simplex, the convex hull of 
unit basis vectors. Any general sparse representation with S x non-zero coefficients can 
be successfully recovered using t\ minimization (BP), if the quotient polytope ~XC Kx is 
centrally S x — neighborly J2TJ Theorem 1]. This form of neighborliness implies that any 
set of S x vertices of X.C Kx , not including an antipodal pair (pair of ±Xj), span a face. For 
unique recovery of non-negative S x — sparse vectors using the linear program given in ([3]), 
the condition on the quotient polytope XT is that it must be outwardly S x -neighborly 
PS Theorem 1]. Here, we fix T = T AW if can be expressed as a convex combination 
of the columns of X, else we fix T = Tq x where Tq x is the solid simplex, the convex hull 
of r f Kx ~ 1 and the origin. When every set of S x vertices, not including the origin, span a 
face, the quotient polytope is said to be outwardly ^-neighborly 

Lemma 2.2. When X G M. + and the number of non-zero coefficients in ex is S x , the set 
defined in Q) is singleton if the quotient polytope ~K.T^ X is outwardly S x — neighborly. 

Proof. By assumption, 3h such that h T X = w T , w > 0. Consider the quotient 

polytope UTq^, where U = XW _1 and W = diag(w). Since XT Xa: is outwardly 
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S x — neighborly and the positive scaling of vertices does not affect the neighborliness of 
a polytope, \JT x is also outwardly S x — neighborly. Now denote y = y/(h T y) and 
7 = Wa/ (h T y). The set defined in Q has a one-to-one correspondence with 



{ 7 | y = U 7 ,7>0}, (7) 

If we show that ([7]) is singleton, then Q is singleton as well. 

Since we know that ||q:||q = S x , this implies that ||7||o = S x . Because of the neighbor- 
liness of the quotient polytope U7^ *, y lies in its simplicial face F of affine dimension 
S x . Denote V to be the set of vertices of F. The remaining vertices in the quotient poly- 
tope are denoted by the set V c . Consider an arbitrary vector y c expressed as a convex 
combination of the vertices V c . Since F is a face, there exists a linear functional A^ and 
a constant c such that A^y = c and A]?y c < c |21j . This means that for an arbitrar- 
ily chosen y and y c which are convex combinations of vertices V and V c respectively, 
||y — y c II 2 > 0. By extension, the rays in the directions of y and y c intersect only at the 

origin. The convex cones formed by the vertices V and V c are denoted as UvIR+ ' and 

lv c i -» 
UycM^ respectively. Since ||y — y c ||2 > is true for arbitrary pairs of y and y c , the 

relative interiors of UyR+' and Uv<=M+ ' are disjoint. Therefore, from [221 Theorem 1.32], 

there exists a hyperplane passing through the origin that separates the cones properly. 

From [31 Prop. 1], the existence of such a hyperplane is sufficient for ([7]), and by extension 

Q, to be singleton. 



2. 1 . Proof of Theorem 2. 1 



If S x < 0.5(1 + l//i x ), the quotient polytope ~KC Kx is centrally S^— neighborly 
Corollary 1.1]. Since vertices^ 7 ^) — {0} C verticesf^C^), central S^— neighborliness 
of ~X.C Kx implies outward S x — neighborliness of XT^*. Note that the vertex will be 
neglected when considering the outward neighborliness. Combining the assumption that 
X G from Lemma 2.2, the set defined in ^ is singleton. 



3. Combined Sparse Representations 

We now turn to investigate the problem of combined sparse representations, where 

a part of the coefficient support is constrained to be non-negative. For the combined 

representation model given in pL the number of non-zero coefficients and the coefficient 
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support for a are given by S x and X respectively. For j3, they are respectively denoted 
as Sd and V. Let us define the combined representation vector 8 = [ot T (3 T } T and the 
combined dictionary G = [X D]. The set Q indexes the non-zero coefficients in 8. The 
length of 8 is denoted by K g and its number of non-zero coefficients is referred to as S g . 
We will refer to the coefficient vector 8 as the combined representation, since it contains 
both non-negative and general entries from the coefficient vectors a and (3 respectively. 
We will define the cross-coherence between the matrices X and D as 

|x r d-| 

H g = max j — 1 j . (8) 
{ j iFiblldjlh 

We will present deterministic sparsity thresholds for recovery of the ML solution of @ 
when the coefficient supports are unknown as well as partially known. 

3.1. Non-Zero Supports of at and (3 Known 

The vectors ol\ G WL Sx and f3 l G M. Sd contain the non-zero coefficients of ck and (3 
indexed by X and T> respectively. The matrices Xi and Di contain the columns of X 
and D indexed by the sets X and V respectively. Since the coefficient supports are 
known, we can express ^ as 

y = Xiai + Di/3 1 , (9) 

where cki > 0. We define 8i = [af 0\] T and the matrix Gi = [Xi Di]. Recovery can 
be performed using least squares with inequality constraints (LSI) [231 Chap. 23] as 

min ||y — Gi<5i|| 2 subject to I^^i > 0, (10) 

<5i 



where \% — \J-s x ®s x ,s g ] is the indicator matrix such the constraints I^^i > and 
(Xi > are equivalent. 

If the matrix Gi has full column rank, <5i can be estimated by just using least squares 



(LS) instead of LSI, as the additional constraint in (10) will not impact the solution. The 



following theorem presents a sufficient condition for Gi to be of full column rank. 

Theorem 3.1. (JT8j \TJ^) For the system defined in the matrix Gi = [X x Dj] has 

full column rank if 

s Sa c [i-/i,(^-i)] + [i-M^-i)] + (n) 

n 2 g 



3.2. Non-Zero Support of (3 Alone Known 

We will now consider the case where the non-zero support of (3 given by the set T> is 
known for the system in (|2]). We will derive conditions for unique recovery of a. using 
NN-BP and NNLS. With the knowledge of non-zero support of (3, we can rewrite ^ as 

y = Xa + Di/3 1 . (12) 

Define Px> to be the projection matrix for the subspace orthogonal to the column space 
of Di, i.e., 

P v = l M - DiDj. (13) 



Premultiplying (12) with P©, we get 

P vy = P^Xck where a > 0. (14) 



Let us define y = Px>y and X = P^X, such that ( 14 ) becomes 



y = Xck where a. > 0. (15) 



The condition for recovery of the unique solution a from ( 15 ) using NN-BP is 



S x < 0.5 M + ^ , (16) 

where fix is the coherence of X. 

Lemma 3.2. (fT8^ ) The coherence o/X, given by [i x can be upper bounded as 

„ < o, ( |i ,r M f- i> ;; ( + i + % fe) 2 ) . m 

\fi x [l - /id{S d - 1)J + + SbUy 
The above lemma follows directly from [181 Theorem 5]. This also implies that for the 
existence of D{, we need to have Sd < 1 + 1/fid- 

Lemma 3.3. Let 

{a\y = Xa, a > 0} = {a}. (18) 
For a given non-zero support set V of (3 

{6t\y = Xa,a > 0} = {a} (19) 



holds if (16) is satisfied, and 



3h such that h T X > and h T D! = 0. (20) 
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Proof. From Theorem 2.1 we know that the singleton condition (19) holds true if (a) 
the condition in (16) is satisfied, and (b) 3r such that r T X > 0. Since (18) is true by 



assumption, 3h such that h T X > 0. For h T X > and r T X > to hold together, we 
should have h = P^ 1 "- Therefore, we have h T D x = 0, following the definition of P© in 



(13). 



If the sufficient conditions in Lemma 3.3 are satisfied, NNLS can be used to recover the 



unique solution of (14), for a given non-zero support V of (3. 



3.3. Non-zero Supports of at and (3 are Unknown 

When the supports of at and /3 in ([2]) are unknown, we will first consider the problem 
of recovering the coefficients using the convex program, 



min ||<5||i subject to y = GS, I^S > 0, 
s 



(21) 



which we refer to as COMB-BP. Here, 1^ = [1^ 0K x ,K g ] is an indicator matrix that 



picks out at from the vector S such that the constraint in (21) is equivalent to at > 0. 



When deriving the threshold on S g for the recovery of the MLq solution, without loss of 
generality, we assume that S x < S d and fi x < fi d . Similar thresholds can be derived for 
the other cases also. 



3.3.1. Condition for Recovering the MLq Solution using the Convex Program 
The sufficient condition for the COMB-BP to recover the MLq solution is 

max ||G|gj||i < I. 

ieg c 



(22) 



This condition is same as the one given in [2U Theorem 3.3] for recovery of a general 
sparse vector using BP, since the l\ norm does not depend on the sign of the coefficients. 



From [T71 Theorem 3], the condition (22) can be expressed as 



[I + fi d )(2S x fi d + S d (/j, g + fi d )) + 2S x S d (/j, 2 - /il) < (1 + fi d ) 2 - 



(23) 



The threshold on the total number of non-zero coefficients, S g , is derived using (23), and 
can be found in [T71 Corollary 3]. 
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3.3.2. Condition for Recovering the ML Solution using a Greedy Algorithm 

We propose a greedy pursuit algorithm that can be used to recover the ML® solution 
from ([2]). The proposed COMB-OMP algorithm follows a procedure similar to the OMP 
algorithm [21] and is presented in Table [TJ The stopping criterion for this algorithm 
is either the maximum number of iterations/non-zero coefficients, T, or the £2 norm of 
the residual, e. In the algorithm, ir(i) denotes the correlations computed for the current 
residual with the normalized atom gj. When updating the index set of chosen dictionary 
atoms, Qt-, we consider only the positive maximum correlation for atoms corresponding to 
X and absolute maximum correlation for atoms corresponding to D. This is consistent 
with our combined representation scheme. The solution update can be performed using 
a constrained least squares procedure. The final debiasing step computes the solution 



using the LSI algorithm described in Section 3^ This step will be ignored when deriving 
the sparsity threshold, since it improves the solution only when the sparsity threshold is 
not satisfied and when there is additive noise in the combined model ([2]). 



Table 1: The COMB-OMP Algorithm for Greedy Pursuit 
of a Combined Representation. 

Goal 

Recover the MLq solution from y = G<5 such that 1^8 > 0. 
Input 

y, the input vector. 

G = [X D], the combined dictionary. 

T, the desired number of iterations. 

e, error tolerance. 

Initialization 

- Iteration count, t — 0. 

- Solution, S t = 0. 

- Residual, r t = y — GS t = y. 

- Active coefficient supports, X t = {}, V t = {}, Q t = {}. 

- All coefficient supports, X = V = {i}?J Kx+v Q = XUV. 

- Non-active coefficient supports, X£ = X, T) c t = V, Ql = Q. 
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Algorithm 

Loop while t < T OR ||r t ||2 > e 

- Compute correlations: 

tt(z) = i% for 1< % < K g . 

- Update support: 

i = argmax[7r(i)] + . 

j = argmax |7r(j)|. 
k = argmax([vr(i)] + , 

like X£, then X t+1 = X t U {A;}, else £> m =D ( U {A;}. 
= Xt+i U Pt+i. 

- Update solution: 

St+i = argmin^ ||y — G<5 j| 2 subject to supported) = Gt+i, Ix t 8 > 0. 

- Update residual: r t +i = y — Gdt+i- 

- Update support sets: 

Qt+i — Q — Qt+i, X£ +1 = x — x t+ i, T>i +l = v — v t+1 . 

- Update iteration count: t = t + 1. 

end 

Debias to compute final 8: 

S t = argmin 5 ||y — G<5 1| 2 subject to supported) = Q t , 1^5 > 0. 



The sufficient sparsity threshold on the coefficient vector under which the COMB- 
OMP will recover the MLq solution will be investigated. Some of the strategies used 
in the proofs are inspired by similar techniques used in [T7J HBJ 121] • In order to derive 
the threshold, we will divide the dictionary G = [X D] into four sub-dictionaries Xi G 
R M * S ; X 2 G Dl e R Mxs d and Da G R Mx(K d -s d )_ We assume tria t the 

matrix Gi = [X x D x ] contains the atoms that participate in the representation and 
G2 = [X 2 D 2 ] contains those that do not participate. This implies that the signal y can 
be represented as 

y = X 1 ai + Di/3 1 , (24) 

where the elements of ai G R x are strictly positive and those of (3 1 G M. Sd are non-zero. 
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Lemma 3.4. When the matrix Gi = [Xi Di] has full column rank, y is given by (24), 
and the residual r t of COMB-OMP satisfies 



max(max(Xf r t , 0), ||Dfr t | 



|Gf rt||oo, 



(25) 



the sufficient condition for COMB-OMP to uniquely recover the ML Q solution from 
is 

it. 



max ||Gjgi||i < I. 



(26) 



Proof. See Appendix A 



Note that the sufficient condition (26) given in Lemma 3.4 is the same for a general 



representation also [T7J 24J. However, the important difference in the case of recovery 



using COMB-OMP is that (26) becomes sufficient only when (25) holds. As we will 



see in the following lemmas, this will lead to a significant difference in terms of sparsity 
threshold when compared to COMB-BP. We will first derive conditions under which the 



first step of the COMB-OMP, when y = r , will satisfy (25). This will then be extended 
to the residuals at all steps, r t , where t > 1. 

Lemma 3.5. When the matrix G\ = [Xi Di] has full column rank, and y is given as 



(24), (25) will be satisfied for y = r if 



(S x - l)fi d + S d fi g < -. 



(27) 



Proof. See Appendix B 



The condition given by (27) needs to be satisfied even if there is one non- negative 



component in the combined representation. For now, let us assume that (25) holds for 



all r t , where t > 1, and derive the threshold on S g such that the ML solution can be 
recovered from ([2]). It will be shown later in the section that the threshold on S g obtained 
indeed implies that (p5|) holds for all r t ,t > 1. 



Lemma 3.6. When the matrix Gi has full column rank, and y is given as (24), the 



sufficient condition for (26) to be satisfied is 

Sxf-d + Sdf>g 



1 - (S x fid + S d /j, g - fi d ) 
13 



< 1 



(2f 



Proof. See 



Appendix C 



When (28) is satisfied, (27) holds as well. Since the number of non-zero coefficients, 



S x and Sd, of cki and f3 1 are unknown, we need to derive the condition on recovery that 
depends only on the number of non-zero coefficients of the combined representation S g . 



Lemma 3.7. When the matrix G\ has full column rank, and y is given as (24), the 



sparsity threshold on the combined coefficient vector 5 for ^28y to hold is 

1 



S g < 0.5 ( 1 



(29) 



Proof. See|Appendix D 



A similar procedure to obtain the threshold on S g using (27), results in S g < 0.5(2 + l/// s ) 



which is only slightly better than (29). Moreover, as stated already (28) implies rt27J) and 



not vice- versa. Therefore the sparsity threshold in (29) cannot be made better. We will 



assume that fi g = fid, as obtained in the proof of the above lemma, and show that the 
above bound is sufficient for recovering the subsequent atoms using the residuals r t , when 
t > 1. 



Lemma 3.8. When the matrix Gi has full column rank, and y is given as (24), (25) 



will hold true for residual at any step, r t for t > 1, when fi g = [id and (29) is satisfied 



Proof. See Appendix E 



Now, we are ready to state our main theorem without proof, since it follows directly 
from the lemmas stated in this section. 



Theorem 3.9. For any y that follows the combined model in §2y, COMB-OMP will 
recover the ML solution if the number of non-zero coefficients, S g , is less than 0.5(1 + 

1/flg). 

From the lemmas proved in this section, it is clear that this threshold cannot be made 
better. Contrast this with the case of recovery using a convex program discussed in Sec- 



tion 3.3.1 as well as sparsity thresholds for recovery using BP and OMP when ct and 

(3 are general sparse vectors (TTJ Eqn. (13)]. Figure [l] compares the thresholds on S g 

when fi g = 0.01 and fid varies from to 0.01. We can see that an improvement up to a 

14 



factor of two can be observed with COMB-BP, BP and OMP algorithms when compared 



to COMB-OMP. From the proof of Lemma |3.5[ it can be observed that introducing 
non-negative constraint on even one coefficient in the representation drastically alters 
the deterministic sparsity threshold of greedy OMP-like algorithms. Note that, however, 
the sparsity thresholds are pessimistic and in the experiments provided in the next sec- 
tion, COMB-OMP performs better than OMP and also has much reduced computational 
complexity when compared to COMB-BP and BP. 



4. Experiments 

The COMB-BP and the COMB-OMP algorithms incorporate the prior knowledge 
that a set of coefficients to be recovered are non-negative. If we use BP and OMP algo- 
rithms for recovery, this prior knowledge cannot be exploited. In order to establish that 
this additional information leads to improvement in recovery performance, we performed 
numerical experiments by realizing the elements of dictionary G from an i.i.d. zero-mean, 
unit- variance, Gaussian distributions. The non-zero coefficients in the coefficient vector 6 
are either signs (±1) or realized from a uniform distribution. We varied the proportion of 
the non-negative and the unconstrained coefficients in the combined representation and 
tested the performance of COMB-BP, BP, COMB-OMP and OMP algorithms in exact 
and approximate recovery. For the case of exact recovery, we also compared NN-BP and 
NN-OMP algorithms which impose the constraint that all coefficients are non-negative. 

The total number of atoms in G was fixed at K g = 200. The three cases tested were 
(a) K x = 50, K d = 150, (b) K x = 100, K d = 100, and (c) K x = 150, K d = 50. The loca- 
tion of the non-zero coefficients in 8 were fixed uniformly at random. When the non-zero 
coefficients were realized from a uniform distribution, the non-negative coefficients were 
obtained from the uniform distribution U(0, 1) and the general coefficients were obtained 
from U(— 1,1). When the non-zero coefficients were signs, the general coefficients were 
obtained with equiprobable positive and negative signs. The number of non-zero coeffi- 
cients S x and Sd were varied from 1 to 30 each, and hence the total number of non-zero 
coefficients, S g , varied from 2 to 60. y was obtained using the combined model (J2J), and 
coefficient recovery was performed using the six algorithms. The relative recovery error 
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between the recovered coefficient vector 8 and the actual coefficient vector 8 is 

RRE = 11(5 T ^ll^ (30) 

11*112 

If the RRE is less than 10 -6 , the coefficient is said to be recovered exactly. Each ex- 
periment was repeated 10 times in order to measure the average performance in cases of 
both exact and approximate recovery. The average relative recovery error is the mean of 
RRE over all iterations. 

Let us first consider the case when the coefficients are realized from the uniform distri- 
bution. From Figure [2] it can be seen that as K x increases, the performance of COMB-BP 
and COMB-OMP become increasingly better when compared to that of BP and OMP 
respectively. In particular, the performance of COMB-BP substantially improves as the 
non-negative component in the representation becomes bigger. Note that the recovery 
performance of NN-OMP and NN-BP algorithms do not compare well with the rest of 
the algorithms, since they impose the constraint that all coefficients are non-negative, 
whereas actually only a part of them are. Furthermore, in Figure |3j it can be observed 
that COMB-BP and COMB-OMP algorithms exhibit lesser average RRE when com- 
pared to their unconstrained counterparts. In this case, the gap in performance between 
OMP and COMB-OMP is very pronounced when K x is large. Similar behavior can be 
observed when non-zero coefficients are signs (Figures [4] and |5| but in general the dif- 
ferences in performance of the algorithms are less prominent. The experiments clearly 
show that COMB-BP and COMB-OMP perform better than BP and OMP respectively, 
although the sparsity thresholds derived in the previous section did not point to such an 
improvement. This is because the deterministic sparsity bounds are generally pessimistic. 
Furthermore, the presence of a large non-negative component substantially improves the 
recovery performances of COMB-BP and COMB-OMP. The average RRE using NN- 
OMP and NN-BP algorithms are not shown since they are much higher than those of the 
other algorithms. 

Finally, the average CPU time taken by each of the algorithms to recover a coeffi- 
cient vector is computed and shown in Figure |6j for the case when G is obtained from 
a Gaussian ensemble, the non-zero coefficients are realized from a uniform distribution, 
K x = 100 and = 100. The experiments were performed using MATLAB R2010b on 

a 2.8GHz, Intel i7 desktop. Clearly, COMB-OMP and OMP have much lesser computa- 
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tional complexity when compared to COMB-BP and BP. 



5. Conclusions 

We considered the problem of recovering sparse solutions from a overcomplete lin- 
ear model when the solution vector was constrained to be either completely or partially 
non-negative. When the solution was completely non-negative, based on the theory of 
polytopes, we derived conditions on the dictionary for the existence of a unique solution. 
In the case of combined sparse representations, we considered cases when the coefficient 
support was completely known, partially known or completely unknown. When the co- 
efficient support was completely unknown, we proposed the COMB-OMP algorithm and 
derived the deterministic sparsity threshold that guarantees recovery of the unique, min- 
imum £ norm solution. Experimental results, using dictionaries drawn from a Gaussian 
ensemble and non-zero coefficients realized from a uniform distribution or a equiprobable 
distribution of signs, showed that the COMB-BP and COMB-OMP algorithms perform 
better in terms of exact and approximate recovery compared to their unconstrained coun- 
terparts. A possible direction for future work is to derive probabilistic sparsity thresholds 
for NN-BP and COMB-BP algorithms, under appropriate assumptions on the dictionary 
and the coefficient vectors, that will explain the improved experimental performance of 
sparse recovery algorithms with non-negativity constraints when compared to their un- 
constrained versions. 



Appendix A. Proof of Lemma 3.4 



For COMB-OMP to recover the unique sparsest representation, no atom from G2 must 
enter the support set Gt a t any iteration. Therefore, the residual r t at each iteration t 
must satisfy the condition 

max(max(X| , r t ,0), HD^rJoo) 
P r i = 7 T^r „ \ — n — r < 1- A.l 

Since 

max(max(X^r t ,0), HD^H^) < HG^i-J^, 



17 



p(r t ) can be bounded as 



P0t 



< 



< 



max(max(Xfr i ,0), ||Dfr t ||oo) 

l|Gj(G 1 ) r G^rt || oo 
max(max(X : fri,0), HDfrtUoo) 

ii 00,00 

IIG^frtHoo 
max(max(Xfr t ,0), HDfrJoo) 

II "2 V 1/ II 00,00 

||G{G2||l,l 

max ||Gjg;||i 



■ieg 



(A.2) 
(A.3) 

(A.4) 

(A.5) 
(A.6) 
(A.7) 



Eqn. (A.3) holds since (G{) T Gf is an orthoprojector onto the column space of Gx- Both 
y and G8t lie in the column space of Gi and hence r t lies in the same space. The 
properties of ||.||oo,oo ensures that (A.4) is true. By assumption, the denominator of (A.4) 
equals HGfrJoo. Therefore, (A.5) holds true and (A.6) follows from relation || A.- r || 00i00 = 



A||n for any matrix A. From (A.l), (A.2) and (A.7), the sufficient condition provided 



in (26) is obtained. 



Appendix B. Proof of Lemma 3.5 



For (25) to be satisfied, the sufficient condition is that 



max(Xfr i ,0) = ||Xfr 



1 11 00 • 



(b.i; 



Therefore, we only have to consider the case where an atom from Xi will be picked. Let 
us denote <5i = [ct[ @\] T and z = Xfy = XfGi<$i. We will derive the bounds on the 
maximum positive value, z m , and the minimum negative value, z n , of z. We denote the 
smallest possible lower bound on z m as z m , and the largest possible lower bound on \z n \ 



as z n . The worst-case guarantee for (B.I) to be true is 



Zm, ^ Zn. • 



(B.2) 



Using the fact that 



XfG 1 = [I^ 0] + pCfXi - Ift, XfD 



1 b 
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the correlation vector z can be expressed as 



z = «i + [XfXx - I Sx X^Di]*!. 



(B-3) 



Let us define Ci = [XfXi — Is x XfDi] and the elementwise bounds on the submatrices 



are, 



and X^Di < Ij^ls^sJ. It is clear that the maximum row sum of Ci is 

|| Cl || 00,00 < (S X — + Srf/Zg. 



(B.4) 



In order to derive the smallest lower bound on z m , we will assume that all the coeffi- 



cients in <5i have the same absolute value given by a, and hence, from (B.3), we have 



z m > « - allCiHo^oo 

> a(l - [(S x - l)fi d + S d fx g ]) = z n 



(B.5) 



The required bound on z n can be obtained by setting one element of cxi as a, where 
< a < a, and the absolute value of all the other elements of <5i as a. We now have 



z n > a - allCil 



00,00 ; 



and as a — > 0, 



\z n \ < a[(S x - l)fid + Sd/jg] = z n , 



(B.6) 



which is the largest possible lower bound on z n . Subsituting (B.5) and (B.6) in (B.2) 



results in (27). 



Appendix C. Proof of Lemma 3.6 



The condition for success of COMB-OMP can be written as 



max HGjgiHi < ||(GfGi) 1 max ||Gfgi||i, 



(Cl) 



using the property of ||.||i 1 and the fact that G{ = (GfGi) 1 Gf . 
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In order to compute the lower bound for ||(G^Gi) 1 ||i,i, we first expand the Gramm 
matrix 

Gf G x = I 3g + C, (C.2) 

where 

XfXj. — I$ x X^Di 
DfXx Df Dx - I Sd 

C can be bounded elementwise as 

Hg^S d ,S x Vd(ls d ,S d - IsJ 

Vgls d ,S x Vd(ls d ,S d - 

since fi x < fid by assumption. The maximum column sum of C is bounded as 



ICI < 



< 



I C || 1,1 < (S x - l)fl d + SdUg, 



(C.3) 



since S x < S d . From (C.2), we also observe that 1 1 <ZU 1 1 1,1 < 1, since G^ Gi is strictly 



diagonally dominant, because of the linear independence of the columns of Gi. Using 



(C.2), we can write 



IKGfGO-lvHK^ + Cnki. (C.4) 
The Neumann series J2k(~C) k converges to (I5 + C) _1 , whenever 1 1 C 1 1 1,1 < 1 |25j and 



hence (C.4) can be expressed as 

iKGfGinki 



£(-c)' 

k=l 

00 

< £ iidfc 

k=l 



1.1 



1- C n 



< 



1 — (S x fi d + S d fig — fid) 



(C.5) 



usmc 



ing (C.3). If gi is a vector chosen from X 2 , |Gjgj| < [fi d lg x l^g^-s ] an d hence 



we 



have 



max ||Gf gi||i < S x fi d + S d fi g = (S x + S d )fi d + S d (/i g - fi d ). 
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(C.6) 



If gi is chosen from D 2 , |Gfgj| < {pigl'g ^dXs Y ■ Therefore 



max || G x gi || i < S x fjLg + S d fx d = (S x + S d )fi d + S x (fx g - fi d ). 



(C.7) 



Since S d > S x , among (C.6) and (C.7), we will choose (C.6) as our bound. Substituting 



(C.6) and (C.5) in (C.I), we can obtain (28) as the condition for COMB-OMP to succeed. 



Appendix D. Proof of Lemma 3.7 



Rewriting (28), we obtain 



The threshold on the total number of non-zero coefficients, S g , can be obtained by mini- 
mizing ip(Sd) + S d over S d . The constraint is that S d > 1 since S x < S d and the overall 
representation will have at least one non-zero coefficient. Denoting S to be the sparsity 
threshold, it can be obtained as 

S = mm[S d + if){S d )]. 
s d >v 

Relaxing the constraint that S d is an integer, the minimum will be obtained when \x g = \i d 
and the minimum value is 

^°- 5 ( i+ £>' 

which is the strict upper bound on S„. 



Appendix E. Proof of Lemma 3.8 



Since we assumed that \i g = fi d , we will assume that the overall coherence of Gi is 
l^g and the total number of columns in Gi is S g . We will denote Gi = [G a G&], where 
G a with S a columns contains the atoms already chosen for the representation and G& 
contains Sb = S g — S a atoms, one of which will be chosen by the residual. The residual 
r t will be simply denoted as r for notational convenience and is obtained using a least 
squares procedure as 

r = y-G a 6 a , (E.l) 



where 



Sa = Gly. 
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(E.2) 



Let us denote the correlation vector as 



(E.3) 



Similar to the proof of Lemma 3.5, we are only concerned about recovering the non- 



negative coefficients as it will give us sufficient conditions under which (25) will be satis- 



fied. The bounds on the maximum positive value, z m , and the minimum negative value, 
z n , of z will be derived assuming that all the elements in 8b are constrained to be non- 
negative. In this case, the smallest possible lower bound on z m , given by z m and the 



largest possible lower bound on \z n \ given by z n . For (25) to hold for any r t , where t > 1 



similar to the proof of Lemma 3.5, we need to show that 



Zm. Zn. • 



(E.4) 



We will first expand (E.3) using (E.l) and (E.2) 



z = G^(y - G a G\y) 
= Gl(I-G a G{)G 1 S 1 
= Gl(L - G a Gt)G 6 5 6 , 



(E.5) 
(E.6) 



which is obtained by substituting Gi = [G„ G&] and <5i = [6^ 6j] T in (E.5). Let us 
denote any two distinct columns from G& by gi and gj. Let us also denote the matrix 
Q = Gj (I — G a G\)Gb, which is of size Sb x Sb, and designate its (i, j) th element to be 



qij. The correlation vector in (E.6) can be now expressed as 

z = QSb 

We will compute bounds on the diagonal and off-diagonal elements of Q. Expanding G^, 
we can write 



= \ g J[I-G a (G T a G a riG T a ] gj \ 
< IgfgJ + Igf G^G^GJ^Grg, 



< 



/ &./| + ||G a &l||Q 



(G a G a 



(E.7) 
(E.8) 
(E.9) 



Eqn. (E.8) follows from applying triangle inequality on (E.7) and (E.9) is obtained by 



upper bounding the second term in the right hand side of (E.8). We can express 



\G T agl \\ 2 2 < sy g 
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since the maximum absolute coherence between any two elements in Gi is fi g . Since 
GjG a ) _1 || < l/A m j„(G^G a ), and by Gershgorin's disc theorem [26], Theorem 6.1.1], 



Amm(G^G a ) < [1 - fi g (S a - 1)] + , we can rewrite (E.9) as 



When i = j, we have 



(E.10) 

(E.ll) 
(E.12) 

(E ' 13) 

Eqn. (E.12) follows from applying reverse triangle inequality on (E.ll) and ( |E.13 ) is 
obtained by following steps similar to the derivation of upper bound on \q%j\. The bounds 



l ^ | - /i9+ [l-M^-D] 



\^G a {G T a G a )-^\ 



> 1 



T I 
r: g. 



given by (E.10) and (E.13) are valid only if 1 — fi g (S a — 1) > 0, which can be verified by 



substituting pi g < l/(2S g — 1), from (29), and S a < S g . Therefore, (E.ll) and (E.13) can 
be rewritten as \qij\ < qi, \qu\ > q2, where 

qi = Cxiig 

q 2 = Ci(l - S a fi g ), 

and Ct = (l + fx g )/[l-fx g (S a -l)]. 

We know that the diagonal elements of Q are lower bounded by q± and the off-diagonal 
elements are upper bounded by q 2 . Since there are rows in Q, the smallest lower bound 
on z m is 

z m > aq 2 - a(S b - l)q x = z m (E.14) 

which is obtained when all the elements in 8b are set to a. The required bound on z n is 
obtained by setting one element of 6b corresponding to a positive coefficient as a, where 
< a < a, a approaches zero and all the other values in 8b are set to a. z n can be now 
bounded as z n > aq 2 — ctqi(Sb — 1). As a — > 0, we have 

\z n \ < a(S b - l)q 2 = z n . (E.15) 



Using (E.4), (E.14) and (E.15), we have 

Ms < 



1 



2<S'g S a 2 



which is always satisfied since we know from (29) that /i 9 < l/(2S g — 1) and S a > l- 
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Figure 1: Deterministic sparsity thresholds for COMB-BP, BP, COMB-OMP and OMP with fj, g = 0.01 
in order to recover the MLq solution from Q. 
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Exact recovery performance: K = 50 K = 150 



Exact recovery performance: K = 100 K = 100 
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Exact recovery performance: K = 150 K = 50 
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Figure 2: Exact recovery performance of COMB-BP, BP, COMB-OMP, OMP, NN-OMP and NN-BP 
when G is obtained from a Gaussian ensemble and the non-zero coefficients are realized from a uniform 
distribution, (a) K x = 50 and K d = 150, (b) K x = 100 and K d = 100, (a) K x = 150 and K d = 50. 
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Coefficient recovery error: K = 50 K =150 



Coefficient recovery error: K = 100 K = 100 
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Figure 3: Average relative recovery error of COMB-BP, BP, COMB-OMP and OMP when G is obtained 
from a Gaussian ensemble and the non-zero coefficients are realized from a uniform distribution, (a) 
K x = 50 and K d = 150, (b) K x = 100 and K d = 100, (a) K x = 150 and K d = 50. 
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Exact recovery performance: = 50 K d = 150 Exact recovery performance: = 100 K d = 100 
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Figure 4: Exact recovery performance of COMB-BP, BP, COMB-OMP and OMP when G is obtained 
from a Gaussian ensemble and the non-zero coefficients are signs, (a) K x — 50 and Kd — 150, (b) 
K x = 100 and Kd — 100, (a) K x = 150 and Kd = 50. The legend here is same as that of Figure [2j 
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Coefficient recovery error: K = 50 K =150 



Coefficient recovery error: K = 100 K = 100 
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Figure 5: Average relative recovery error of COMB-BP, BP, COMB-OMP and OMP when G is obtained 
from a Gaussian ensemble and the non-zero coefficients are signs, (a) K x — 50 and Kd = f50, (b) 
K x = 100 and Kd — 100, (a) K x — 150 and Kd = 50. The legend here is same as that of Figure [3j 



30 



1 




I I I I I 

10 20 30 40 50 60 

Number of non-zero coefficients 



Figure 6: Average CPU time in seconds taken by COMB-BP, BP, COMB-OMP and OMP algorithms 
to recover one coefficient, when G is obtained from a Gaussian ensemble and the non-zero coefficients 
are realized from a uniform distribution with K x = 1 00 and K4 = f 00. 
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